Goto

Collaborating Authors

 binary decision tree



Decision Jungles: Compact and Rich Models for Classification

Neural Information Processing Systems

Randomized decision trees and forests have a rich history in machine learning and have seen considerable success in application, perhaps particularly so for computer vision. However, they face a fundamental limitation: given enough data, the number of nodes in decision trees will grow exponentially with depth. For certain applications, for example on mobile or embedded processors, memory is a limited resource, and so the exponential growth of trees limits their depth, and thus their potential accuracy. This paper proposes decision jungles, revisiting the idea of ensembles of rooted decision directed acyclic graphs (DAGs), and shows these to be compact and powerful discriminative models for classification. Unlike conventional decision trees that only allow one path to every node, a DAG in a decision jungle allows multiple paths from the root to each leaf. We present and compare two new node merging algorithms that jointly optimize both the features and the structure of the DAGs efficiently. During training, node splitting and node merging are driven by the minimization of exactly the same objective function, here the weighted sum of entropies at the leaves. Results on varied datasets show that, compared to decision forests and several other baselines, decision jungles require dramatically less memory while considerably improving generalization.



Decision Jungles: Compact and Rich Models for Classification

Neural Information Processing Systems

Randomized decision trees and forests have a rich history in machine learning and have seen considerable success in application, perhaps particularly so for computer vision. However, they face a fundamental limitation: given enough data, the number of nodes in decision trees will grow exponentially with depth. For certain applications, for example on mobile or embedded processors, memory is a limited resource, and so the exponential growth of trees limits their depth, and thus their potential accuracy. This paper proposes decision jungles, revisiting the idea of ensembles of rooted decision directed acyclic graphs (DAGs), and shows these to be compact and powerful discriminative models for classification. Unlike conventional decision trees that only allow one path to every node, a DAG in a decision jungle allows multiple paths from the root to each leaf. We present and compare two new node merging algorithms that jointly optimize both the features and the structure of the DAGs efficiently. During training, node splitting and node merging are driven by the minimization of exactly the same objective function, here the weighted sum of entropies at the leaves. Results on varied datasets show that, compared to decision forests and several other baselines, decision jungles require dramatically less memory while considerably improving generalization.


CACTUS: a Comprehensive Abstraction and Classification Tool for Uncovering Structures

arXiv.org Artificial Intelligence

The availability of large data sets is providing an impetus for driving current artificial intelligent developments. There are, however, challenges for developing solutions with small data sets due to practical and cost-effective deployment and the opacity of deep learning models. The Comprehensive Abstraction and Classification Tool for Uncovering Structures called CACTUS is presented for improved secure analytics by effectively employing explainable artificial intelligence. It provides additional support for categorical attributes, preserving their original meaning, optimising memory usage, and speeding up the computation through parallelisation. It shows to the user the frequency of the attributes in each class and ranks them by their discriminative power. Its performance is assessed by application to the Wisconsin diagnostic breast cancer and Thyroid0387 data sets.


Rethink Decision Tree Traversal

arXiv.org Artificial Intelligence

QuickScorer[12] and RapidScorer[21] are proposed based on bit-vectors of the false nodes in order to speed up the additive ensemble of regression trees in learning to rank. Inspired by [12], more works, such as [2; 11; 13; 15], focus on the application and acceleration of additive tree models while we will pay attention to the theory of algorithms specially the representation of binary decision tree in the language of matrix computation. Based on so-called Tree Supervision Loss, a hierarchical classifier is built from the weights of the softmax layer in convolutional neural networks in [18]. In [20; 19], tree regularization is used to enhance the interpretability of deep neural networks. A generalized tree representation termed TART is based on transition matrix shown in [22].


Inapproximability of a Pair of Forms Defining a Partial Boolean Function

arXiv.org Artificial Intelligence

We consider the problem of jointly minimizing forms of two Boolean functions $f, g \colon \{0,1\}^J \to \{0,1\}$ such that $f + g \leq 1$ and so as to separate disjoint sets $A \cup B \subseteq \{0,1\}^J$ such that $f(A) = \{1\}$ and $g(B) = \{1\}$. We hypothesize that this problem is easier to solve or approximate than the well-understood problem of minimizing the form of one Boolean function $h: \{0,1\}^J \to \{0,1\}$ such that $h(A) = \{1\}$ and $h(B) = \{0\}$. For a large class of forms, including binary decision trees and ordered binary decision diagrams, we refute this hypothesis. For disjunctive normal forms, we show that the problem is at least as hard as MIN-SET-COVER. For all these forms, we establish that no $o(\ln (|A| + |B| -1))$-approximation algorithm exists unless P$=$NP.


The Tractability of SHAP-scores over Deterministic and Decomposable Boolean Circuits

arXiv.org Artificial Intelligence

Scores based on Shapley values are currently widely used for providing explanations to classification results over machine learning models. A prime example of this corresponds to the influential SHAP-score, a version of the Shapley value in which the contribution of a set $S$ of features from a given entity $\mathbf{e}$ over a model $M$ is defined as the expected value in $M$ of the set of entities $\mathbf{e}'$ that coincide with $\mathbf{e}$ over all features in $S$. While in general computing Shapley values is a computationally intractable problem, it has recently been claimed that the SHAP-score can be computed in polynomial time over the class of decision trees. In this paper, we provide a proof of a stronger result over Boolean models: the SHAP-score can be computed in polynomial time over deterministic and decomposable Boolean circuits, also known as tractable probabilistic circuits. Such circuits encompass a wide range of Boolean circuits and binary decision diagrams classes, including binary decision trees and Ordered Binary Decision Diagrams (OBDDs). Moreover, we establish the computational limits of the notion of SHAP-score by showing that computing it over a class of Boolean models is always (polynomially) as hard as the model counting problem for this class (under some mild condition). This implies, for instance, that computing the SHAP-score for DNF propositional formulae is a #P-hard problem, and, thus, that determinism is essential for the circuits that we consider.


Combinatorial Decision Dags: A Natural Computational Model for General Intelligence

arXiv.org Artificial Intelligence

A novel computational model (CoDD) utilizing combinatory logic to create higher-order decision trees is presented. A theoretical analysis of general intelligence in terms of the formal theory of pattern recognition and pattern formation is outlined, and shown to take especially natural form in the case where patterns are expressed in CoDD language. Relationships between logical entropy and algorithmic information, and Shannon entropy and runtime complexity, are shown to be elucidated by this approach. Extension to the quantum computing case is also briefly discussed.


An AI model for Rapid and Accurate Identification of Chemical Agents in Mass Casualty Incidents

arXiv.org Machine Learning

In this report we examine the effectiveness of WISER in identification of a chemical culprit during a chemical based Mass Casualty Incident (MCI). We also evaluate and compare Binary Decision Tree (BDT) and Artificial Neural Networks (ANN) using the same experimental conditions as WISER. The reverse engineered set of Signs/Symptoms from the WISER application was used as the training set and 31,100 simulated patient records were used as the testing set. Three sets of simulated patient records were generated by 5%, 10% and 15% perturbation of the Signs/Symptoms of each chemical record. While all three methods achieved a 100% training accuracy, WISER, BDT and ANN produced performances in the range of: 1.8%-0%, 65%-26%, 67%-21% respectively. A preliminary investigation of dimensional reduction using ANN illustrated a dimensional collapse from 79 variables to 40 with little loss of classification performance.